本文研究了多功能网络网络中分散的多武装强盗问题。当他们面对一套常见的M武器并分享每个ARM奖励的相同均值,问题是由N代理同时解决的。每个代理可以从其邻居接收信息,其中代理之间的邻居关系由其顶点代表代理的定向图描述,并且其定向边缘描绘了邻居关系。针对每个试剂提出了一种完全分散的多武装强盗算法,其曲折了经典共识算法和上置信算法(UCB)算法。结果表明,该算法保证了每个代理,以实现比邻居图强烈连接的经典UCB更好的对数渐近遗憾。如果邻居图无向,则可以进一步提高遗憾。
translated by 谷歌翻译
In the realm of multi-modality, text-guided image retouching techniques emerged with the advent of deep learning. Most currently available text-guided methods, however, rely on object-level supervision to constrain the region that may be modified. This not only makes it more challenging to develop these algorithms, but it also limits how widely deep learning can be used for image retouching. In this paper, we offer a text-guided mask-free image retouching approach that yields consistent results to address this concern. In order to perform image retouching without mask supervision, our technique can construct plausible and edge-sharp masks based on the text for each object in the image. Extensive experiments have shown that our method can produce high-quality, accurate images based on spoken language. The source code will be released soon.
translated by 谷歌翻译
深度学习最近在程序分析任务(例如错误检测)方面取得了最初的成功。缺乏真正的错误,大多数现有的作品通过将合成错误注入正确的程序来构建培训和测试数据。尽管达到了高测试精度(例如90%),但发现所得的错误检测器在实践中令人惊讶地无法使用,即用于扫描真实软件存储库时<10%的精度。在这项工作中,我们认为这种巨大的性能差异是由分布变化引起的,即实际错误分布与用于训练和评估检测器的合成错误分布之间的基本不匹配。为了应对这一关键挑战,我们建议在两个阶段训练一个错误检测器,首先是合成错误分布,以使模型适应错误检测域,然后在真实的错误分布上调整模型,以将模型驱动到真实分布。在这两个阶段中,我们利用多任务层次结构,焦点损失和对比度学习来进一步提高性能。我们对三种经过广泛研究的错误类型进行了广泛的评估,为此,我们仔细设计了新的数据集,以捕获真正的错误分布。结果表明,我们的方法实际上是有效的,并且可以成功地减轻分配的转变:我们学到的检测器在测试集和最新版本的开源存储库中都表现出色。我们的代码,数据集和模型可在https://github.com/eth-sri/learning-real-bug-detector上公开获取。
translated by 谷歌翻译
由于缺乏深度信息,单眼3D对象检测在自主驾驶中非常具有挑战性。本文提出了一种基于多尺度深度分层的单眼单目眼3D对象检测算法,它使用锚定方法检测每像素预测中的3D对象。在所提出的MDS-Net中,开发了一种新的基于深度的分层结构,以通过在对象的深度和图像尺寸之间建立数学模型来改善网络的深度预测能力。然后开发出新的角度损耗功能,以进一步提高角度预测的精度并提高训练的收敛速度。最终在后处理阶段最终应用优化的软,以调整候选盒的置信度。基蒂基准测试的实验表明,MDS-Net在3D检测中优于现有的单目3D检测方法,并在满足实时要求时进行3D检测和BEV检测任务。
translated by 谷歌翻译
我们介绍了一个目标驱动的导航系统,以改善室内场景中的Fapless视觉导航。我们的方法在每次步骤中都将机器人和目标的多视图观察为输入,以提供将机器人移动到目标的一系列动作,而不依赖于运行时在运行时。通过优化包含三个关键设计的组合目标来了解该系统。首先,我们建议代理人在做出行动决定之前构建下一次观察。这是通过从专家演示中学习变分生成模块来实现的。然后,我们提出预测预先预测静态碰撞,作为辅助任务,以改善导航期间的安全性。此外,为了减轻终止动作预测的训练数据不平衡问题,我们还介绍了一个目标检查模块来区分与终止动作的增强导航策略。这三种建议的设计都有助于提高培训数据效率,静态冲突避免和导航泛化性能,从而产生了一种新颖的目标驱动的FLASES导航系统。通过对Turtlebot的实验,我们提供了证据表明我们的模型可以集成到机器人系统中并在现实世界中导航。视频和型号可以在补充材料中找到。
translated by 谷歌翻译
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
translated by 谷歌翻译
To generate high quality rendering images for real time applications, it is often to trace only a few samples-per-pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that the rendered pixels at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which makes the supersampling an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features to significantly improve their temporal stability. Then a reconstruct network based on a multi-scale U-Net with skip connections is adopted for reconstruction and generation of the desired high-resolution image. Experimental results and comparisons have shown that our proposed method can generate higher quality results of supersampling, without increasing the total number of ray-tracing samples, over current state-of-the-art methods.
translated by 谷歌翻译
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to the reasoning bias during frame-query interaction, reducing the generalization ability of model. To alleviate above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
translated by 谷歌翻译
Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, limiting their performance on static and occlusion areas. Our approach $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, which is called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
translated by 谷歌翻译
Deploying reliable deep learning techniques in interdisciplinary applications needs learned models to output accurate and ({even more importantly}) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations come from accurate predictions/classifications. We have an opposite claim that explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in those neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network dubbed as NeuroExplainer, with applications to uncover altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, our NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and respective discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximizes the explainability metrics (i.e., fidelity, sparsity, and stability) in network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer led to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.
translated by 谷歌翻译